图表卷积网络(GCNS)广泛应用于许多应用,但仍需要大量标记的培训数据。此外,GCNS的邻接矩阵是稳定的,这使得数据处理策略无法有效地调整来自内置的图形结构的训练数据的数量。从本文中进一步提高了GCN的性能和自学能力,我们提出在一个区域(rrlfsor).rrlfsor上的有效删除的gcns的高效自我监督的GCNS的学习策略(RRLFSOR).rrlfsor可以被视为新的数据增强器来改进过度平滑。在两个有效和代表性的GCN模型上检查了rrlfsor使用三个公开引文数据集 - 科拉,Pubmed和CiteSeer。转换链路预测任务的实验表明,在三个基准数据集的准确性方面,我们的策略始终如一地始终如一的基线模型。
translated by 谷歌翻译
The previous fine-grained datasets mainly focus on classification and are often captured in a controlled setup, with the camera focusing on the objects. We introduce the first Fine-Grained Vehicle Detection (FGVD) dataset in the wild, captured from a moving camera mounted on a car. It contains 5502 scene images with 210 unique fine-grained labels of multiple vehicle types organized in a three-level hierarchy. While previous classification datasets also include makes for different kinds of cars, the FGVD dataset introduces new class labels for categorizing two-wheelers, autorickshaws, and trucks. The FGVD dataset is challenging as it has vehicles in complex traffic scenarios with intra-class and inter-class variations in types, scale, pose, occlusion, and lighting conditions. The current object detectors like yolov5 and faster RCNN perform poorly on our dataset due to a lack of hierarchical modeling. Along with providing baseline results for existing object detectors on FGVD Dataset, we also present the results of a combination of an existing detector and the recent Hierarchical Residual Network (HRN) classifier for the FGVD task. Finally, we show that FGVD vehicle images are the most challenging to classify among the fine-grained datasets.
translated by 谷歌翻译
We present, Naamapadam, the largest publicly available Named Entity Recognition (NER) dataset for the 11 major Indian languages from two language families. In each language, it contains more than 400k sentences annotated with a total of at least 100k entities from three standard entity categories (Person, Location and Organization) for 9 out of the 11 languages. The training dataset has been automatically created from the Samanantar parallel corpus by projecting automatically tagged entities from an English sentence to the corresponding Indian language sentence. We also create manually annotated testsets for 8 languages containing approximately 1000 sentences per language. We demonstrate the utility of the obtained dataset on existing testsets and the Naamapadam-test data for 8 Indic languages. We also release IndicNER, a multilingual mBERT model fine-tuned on the Naamapadam training set. IndicNER achieves the best F1 on the Naamapadam-test set compared to an mBERT model fine-tuned on existing datasets. IndicNER achieves an F1 score of more than 80 for 7 out of 11 Indic languages. The dataset and models are available under open-source licenses at https://ai4bharat.iitm.ac.in/naamapadam.
translated by 谷歌翻译
Non-parametric tests can determine the better of two stochastic optimization algorithms when benchmarking results are ordinal, like the final fitness values of multiple trials. For many benchmarks, however, a trial can also terminate once it reaches a pre-specified target value. When only some trials reach the target value, two variables characterize a trial's outcome: the time it takes to reach the target value (or not) and its final fitness value. This paper describes a simple way to impose linear order on this two-variable trial data set so that traditional non-parametric methods can determine the better algorithm when neither dominates. We illustrate the method with the Mann-Whitney U-test. A simulation demonstrates that U-scores are much more effective than dominance when tasked with identifying the better of two algorithms. We test U-scores by having them determine the winners of the CEC 2022 Special Session and Competition on Real-Parameter Numerical Optimization.
translated by 谷歌翻译
Of late, insurance fraud detection has assumed immense significance owing to the huge financial & reputational losses fraud entails and the phenomenal success of the fraud detection techniques. Insurance is majorly divided into two categories: (i) Life and (ii) Non-life. Non-life insurance in turn includes health insurance and auto insurance among other things. In either of the categories, the fraud detection techniques should be designed in such a way that they capture as many fraudulent transactions as possible. Owing to the rarity of fraudulent transactions, in this paper, we propose a chaotic variational autoencoder (C-VAE to perform one-class classification (OCC) on genuine transactions. Here, we employed the logistic chaotic map to generate random noise in the latent space. The effectiveness of C-VAE is demonstrated on the health insurance fraud and auto insurance datasets. We considered vanilla Variational Auto Encoder (VAE) as the baseline. It is observed that C-VAE outperformed VAE in both datasets. C-VAE achieved a classification rate of 77.9% and 87.25% in health and automobile insurance datasets respectively. Further, the t-test conducted at 1% level of significance and 18 degrees of freedom infers that C-VAE is statistically significant than the VAE.
translated by 谷歌翻译
The promise of Mobile Health (mHealth) is the ability to use wearable sensors to monitor participant physiology at high frequencies during daily life to enable temporally-precise health interventions. However, a major challenge is frequent missing data. Despite a rich imputation literature, existing techniques are ineffective for the pulsative signals which comprise many mHealth applications, and a lack of available datasets has stymied progress. We address this gap with PulseImpute, the first large-scale pulsative signal imputation challenge which includes realistic mHealth missingness models, an extensive set of baselines, and clinically-relevant downstream tasks. Our baseline models include a novel transformer-based architecture designed to exploit the structure of pulsative signals. We hope that PulseImpute will enable the ML community to tackle this significant and challenging task.
translated by 谷歌翻译
Deep learning based text-to-speech (TTS) systems have been evolving rapidly with advances in model architectures, training methodologies, and generalization across speakers and languages. However, these advances have not been thoroughly investigated for Indian language speech synthesis. Such investigation is computationally expensive given the number and diversity of Indian languages, relatively lower resource availability, and the diverse set of advances in neural TTS that remain untested. In this paper, we evaluate the choice of acoustic models, vocoders, supplementary loss functions, training schedules, and speaker and language diversity for Dravidian and Indo-Aryan languages. Based on this, we identify monolingual models with FastPitch and HiFi-GAN V1, trained jointly on male and female speakers to perform the best. With this setup, we train and evaluate TTS models for 13 languages and find our models to significantly improve upon existing models in all languages as measured by mean opinion scores. We open-source all models on the Bhashini platform.
translated by 谷歌翻译
超越地球轨道的人类空间勘探将涉及大量距离和持续时间的任务。为了有效减轻无数空间健康危害,数据和空间健康系统的范式转移是实现地球独立性的,而不是Earth-Reliance所必需的。有希望在生物学和健康的人工智能和机器学习领域的发展可以解决这些需求。我们提出了一个适当的自主和智能精密空间健康系统,可以监控,汇总和评估生物医学状态;分析和预测个性化不良健康结果;适应并响应新累积的数据;并提供对其船员医务人员的个人深度空间机组人员和迭代决策支持的预防性,可操作和及时的见解。在这里,我们介绍了美国国家航空航天局组织的研讨会的建议摘要,以便在太空生物学和健康中未来的人工智能应用。在未来十年,生物监测技术,生物标志科学,航天器硬件,智能软件和简化的数据管理必须成熟,并编织成精确的空间健康系统,以使人类在深空中茁壮成长。
translated by 谷歌翻译
空间生物学研究旨在了解太空飞行对生物的根本影响,制定支持深度空间探索的基础知识,最终生物工程航天器和栖息地稳定植物,农作物,微生物,动物和人类的生态系统,为持续的多行星寿命稳定。要提高这些目标,该领域利用了来自星空和地下模拟研究的实验,平台,数据和模型生物。由于研究扩展到低地球轨道之外,实验和平台必须是最大自主,光,敏捷和智能化,以加快知识发现。在这里,我们介绍了由美国国家航空航天局的人工智能,机器学习和建模应用程序组织的研讨会的建议摘要,这些应用程序为这些空间生物学挑战提供了关键解决方案。在未来十年中,将人工智能融入太空生物学领域将深化天空效应的生物学理解,促进预测性建模和分析,支持最大自主和可重复的实验,并有效地管理星载数据和元数据,所有目标使生活能够在深空中茁壮成长。
translated by 谷歌翻译
我们概述了新兴机会和挑战,以提高AI对科学发现的效用。AI为行业的独特目标与AI科学的目标创造了识别模式中的识别模式与来自数据的发现模式之间的紧张。如果我们解决了与域驱动的科学模型和数据驱动的AI学习机之间的“弥补差距”相关的根本挑战,那么我们预计这些AI模型可以改变假说发电,科学发现和科学过程本身。
translated by 谷歌翻译